Abstract

Testing  of  complex  systems  is  a  fundamentally  difficult  task,  whether  locating  faults 
(diagnostic  testing)  or  implementing  upgrades  (regression  testing).  Branch  paths  through  the 
system  increase  as  a  function  of  the  number  of  components  and  interconnections,  leading  to 
exponential  growth  in  the  number  of  test  cases  for  exhaustive  examination.  In  practice,  the 
typical  cost  for  testing  in  schedule  or  in  budget  means  that  only  a  small  fraction  of  these  paths 
are  investigated.  Given  some  fixed  cost,  then,  which  tests  should  we  execute  to  guarantee  the 
greatest  information  returned  for  the  effort?  In  this  work,  we  develop  an  approach  to  system 
testing  using  an  abstract  model  flexible  enough  to  be  applied  to  both  diagnostic  and  regression 
testing,  grounded  in  a  mathematical  model  suitable  for  rigorous  analysis  and  Monte  Carlo 
simulation.  Early  results  indicate  that  in  many  cases  of  interest,  a  good,  though  not  optimal, 
solution  to  the  fixed-constraint  problem  (how  many  tests  for  budget  x?)  can  be  approached  as  a 
simple  best-next  strategy  (which  test  returns  the  highest  information  per  unit  cost?).  The  goal  of 
this  modeling  work  is  to  construct  a  decision-support  tool  for  the  Navy  Program  Executive  Office 
Integrated  Warfare  Systems  (PEO  IWS)  offering  quantitative  information  about  cost  versus 
diagnostic  certainty  in  system  testing. 

Keywords:  diagnostic  testing,  regression  testing,  automated  testing,  Monte  Carlo 
simulation,  sequential  Bayesian  inference,  knapsack  problem 
1. Introduction


Many  of  us  commute  to  work  every  day  in  what  has  become  a  relatively  complex 
system:  an  automobile.  When  this  system  fails  us,  we  are  forced  to  allocate  resources  (i.e.,  time 
and  money)  to  diagnostic  testing  and  repair  of  one  or  more  components  within  the  system.  The 
budget  for  this  testing  and  repair  is  generally  constrained  by  our  prudence  and  our  pocketbooks; 
we  hope  that  the  service  technician  employs  a  testing  strategy  that  develops  the  best  answer 
(e.g.,  replace  the  alternator)  for  the  least  cost. 

There  are  several  possible  stopping  criteria  for  this  testing.  In  particular,  a  logical  choice 
would  be  to  stop  testing  when  the  cost  of  replacement  of  all  suspect  parts  is  less  than  the  cost 
of  conducting  one  more  test.  This  presupposes,  however,  that  our  system  under  test  is  in  a 
failed  state.  Suppose  we  replace  a  known  defective  part  or  perform  upgrade  maintenance  on  a 
component:  how  much  testing  must  we  accomplish  to  convince  ourselves  that  the  system  will 
operate  correctly  under  all  conditions? 

In  this  paper  we  present  a  language  of  description  and  a  mathematical  model  to 
describe  a  system  under  testing,  with  the  goal  of  evaluating  strategies  in  terms  of  the 
information  returned  by  a  set  of  tests.  In  this  framework,  then,  testing  is  the  mechanism  by 
which  we  trade  some  fixed  cost  (e.g.,  time,  money)  for  information  about  the  state  of 
subcomponents  in  our  system.  In  general,  we  seek  the  maximum  information  available  for  the 
minimum  cost.  In  the  present  study,  we  consider  the  following  question:  Given  a  fixed  budget, 
what  is  the  maximum  information  discoverable  from  a  particular  test  suite? 

Mathematical  models  of  component  and  system  reliability  have  roots  in  the  work  of  von 
Neumann  (1952)  and  Moore  and  Shannon  (1956a;  1956b),  as  well  as  the  seminal  text  by 
Barlow  and  Proschan  (1965).  The  focus  of  these  early  works  is  generally  on  assessing  the 
overall system reliability, particularly with regard to the economics of preventative versus reactive
maintenance (e.g., see Bovaird, 1961). In the present work, the focus is on efficiently identifying
either  a  defective-by-design  or  failed  component  in  a  complex  system. 

This  fault  diagnosis  is  sometimes  referred  to  as  the  test-sequencing  problem,  and  has 
also been well studied (e.g., see Sobel & Groll, 1966; Garey, 1972; Fishman, 1990; Barford,
Kanevsky  &  Kamas,  2004).  In  general,  these  investigators  start  with  a  system  in  a  known  failed 
state  with  the  goal  of  finding  the  most  cost-effective  sequence  of  diagnostics  to  locate  the  failed 
component  (or  components)  under  a  given  set  of  assumptions. 

In  contrast  to  fault  diagnosis,  the  general  case  of  regression  testing  appears  to  have 
received  less  attention  in  the  open  literature,  with  more  specific  cases  examined  in  the  realm  of 
software  engineering  (e.g.,  Leung  &  White,  1991;  White  &  Leung,  1992;  Weyuker,  1998;  Tsai, 
2001;  Rothermel,  Untch  &  Harrold,  2001;  Mao  &  Lu,  2005).  These  studies  typically  start  with  a 
fully  functioning  system  undergoing  component  modifications  or  upgrades,  with  the  task  of 
establishing  that  component  modifications  have  not  introduced  new  defects  into  the  system. 

In  the  present  study,  we  treat  testing  as  a  unified  activity,  with  risk  and  cost  as  the 
common  tension  regulating  the  degree  of  testing  required.  From  a  fault-diagnosis  perspective, 
we  want  to  arrive  at  a  replacement  or  maintenance  decision  quickly  while  ensuring  the  system  is 
restored  to  perfect  functionality.  From  a  regression  testing  perspective,  particularly  with  the 
open  architectures  employed  within  the  Integrated  Warfare  System,  following  an  engineering 
change  or  upgrade  to  a  component,  we  want  to  conduct  enough  testing  to  verify  that  the  system 
remains  in  perfect  function.  The  element  of  risk  is  that  costs  incurred  for  perfect  knowledge  may 
approach infinity, or may not be achievable with a given test suite. From a practical perspective,
then,  we  accept  with  some  level  of  confidence  (e.g.,  99%  certainty,  95%  certainty)  that  our 
diagnosis  or  prognosis  is  correct. 

The  rest  of  this  paper  is  organized  as  follows.  Section  2  presents  the  model  formulation 
and  fundamental  definitions.  Section  3  details  the  mathematical  model  derived  from  this 
framework.  Section  4  outlines  numerical  experiments  examining  testing  strategies  in  terms  of 
this  model  and  presents  simulation  results.  Section  5  discusses  conclusions  and  avenues  for 
future  work. 

2.  Definitions  and  Model  Formulation 

The  growing  use  of  commercial  off-the-shelf  technologies  in  current  weapons  systems 
(Caruso,  1995;  Dalcher,  2000),  coupled  with  the  complexity  of  end-to-end  systems  (Athans, 
1987;  Brazet,  1993),  suggests  that  we  may  never  have  enough  information  to  fully  specify  our 
system  as  a  white  box,  with  all  software,  hardware  and  communication  interfaces  perfectly 
characterized.  Thus,  we  construct  our  model  with  broad  parameters  that  can  be  constrained  as 
narrowly  as  available  information  permits. 

We  characterize  the  model  system  S  as  a  collection  of  modules  and  a  suite  of  tests  used 
to  interrogate  these  modules.  We  examine  the  system  through  this  test  suite  to  identify  defective 
modules  or  to  determine  that  no  defective  modules  exist.  We  assume  that  tests  return 
ambiguous  information  about  the  state  of  modules  within  the  system;  that  is,  no  single  test  is 
likely  to  return  perfect  knowledge  about  a  particular  module. 

Thus,  in  general,  we  expect  that  some  sequence  of  tests  must  be  applied  to  arrive  at  a 
correct  diagnosis,  where  the  term  correct  may  require  careful  definition  in  terms  of  acceptable 
risk  or  required  level  of  confidence.  Stochastic  simulation  of  the  model  system  provides  a 
framework  in  which  different  testing  strategies  may  be  applied  and  measured  for  further  insight. 
Using  this  Monte  Carlo  approach,  we  may  also  test  the  bounds  of  our  initial  assumptions  with 
additional  simulation. 

2.1  System  and  module  objects 

Within the system S, each module Mi represents the smallest diagnostic or replaceable
unit,  which  does  not  necessarily  correspond  to  a  single  physical  component  in  the  modeled 
system.  We  consider,  for  example,  a  motherboard  comprised  of  a  central  processing  unit 
(CPU),  physical  random  access  memory  (RAM),  a  graphics  adapter,  and  keyboard  interface, 
each  of  which  may  cause  the  motherboard  to  fail.  This  might  be  modeled  as  a  single  module 
labeled  Motherboard  if  the  standard  corrective  maintenance  action  is  to  replace  the 
motherboard.  With  more  granular  diagnostics  and  maintenance  practices,  however,  we  might 
model these components as MB_CPU, MB_RAM, MB_Graphics, and MB_Keyboard if
each is separately testable and replaceable.

Fundamental  to  this  aspect  of  the  model  is  a  source  of  failure  rate  data  for  the  system 
components.  These  failure  rates  become  the  a  priori  data  in  the  larger  probability  model  and  do 
not  necessarily  need  to  be  precise  to  add  value  to  the  iterative  simulation  results.  The  relative 
rates  among  the  modeled  components  (e.g.,  the  Server  module  fails  about  five  times  as  often 
as  the  Router  module)  should  be  close  to  the  observed  data  in  the  physical  system  to  provide 
the  most  realistic  convergence  in  testing  to  a  correct  diagnosis. 
2.2 Test objects


Tests are modeled as system objects that, when executed, provide an ambiguous
assessment  of  one  or  more  modules  within  S.  This  ambiguity  stems  from  two  essential 
elements  that  map  the  tractable  model  to  physical  reality. 

The first ambiguous aspect is that any given test likely exercises only a portion of the
functionality within a module. Each Mi is modeled as a unit circle A (Figure 1). Defects, when
present, are assumed uniformly distributed on this circle. We assume that while multiple
modules may be defective, only one defect exists per module. A defect in Mi is modeled as a
random point on A or equivalently a random point on the interval [0, 1]. Although the module is
the unit of replacement, we parameterize the sub-module details by treating them as a
continuous space covered, in part, by a given test.


Figure 1. The Simple Coverage of Test Tx on Module Mi,
Indicated by the Gray Arc Aix

(The scalar measure of this coverage, A(Aix) = aix, represents
the fraction of Mi exercised by Tx.)


We model the coverage of test Tx on module Mi as the arc Aix (Figure 1). When Tx is
executed, or applied to the model system, the arc Aix on Mi is inspected for a defect. Given the
assumption that defects appear uniformly on this unit circle, the probability that a defect in Mi
will be detected by Tx is the measure of this arc, A(Aix) = aix. The scalar probability of detection by
a test is precisely the user-specified fraction of module functionality exercised by the test. This
element of our language of description permits some ambiguity in characterizing the physical
system without loss of rigor in modeling these tests and modules. In practice, given a sufficient
number of real-world cases from the physical system, this estimate for aix could be refined
through analysis of simulation results.
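
As a rough numerical check of this coverage model, the following Python sketch places a defect uniformly on the unit circle and counts how often it lands on a test's arc. The function name, the arc start position, and the trial count are illustrative assumptions and are not taken from the simulation described later in this paper.

```python
import random

def estimate_detection_probability(arc_start, arc_length, trials=100_000, seed=0):
    """Estimate the chance that a uniformly placed defect lies on an arc.

    The module is the unit circle, parameterized as [0, 1). The arc begins at
    arc_start and extends arc_length around the circle, wrapping past 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        defect = rng.random()                 # defect position on [0, 1)
        offset = (defect - arc_start) % 1.0   # distance travelled past the arc start
        if offset < arc_length:
            hits += 1
    return hits / trials

# Hypothetical coverage aix = 0.3, with the arc wrapping through position 0.
estimate = estimate_detection_probability(arc_start=0.9, arc_length=0.3)
```

Under the uniform-defect assumption, the estimate should settle near the arc length regardless of where the arc starts.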

The  second  ambiguous  aspect  is  that  any  given  test  likely  covers  multiple  modules,  such 
that  any  test  result  must  be  interpreted  as  applying  to  all  modules  covered  by  that  test  (Figure 
2).  For  example,  a  positive  result  (FAIL)  from  a  diagnostic  test  that  covers  the  modules 
Carburetor,  Distributor  Cap,  and  Spark  Plug  Wiring  indicates  that  at  least  one  of  these  modules 
contains  a  defect  (has  failed),  though  additional  testing  would  be  required  to  identify  which 
module  is  the  culprit.  Because  we  expect  that  a  given  test  exercises  multiple  modules  in  the 
system,  we  speak  more  generally  of  the  coverage  of  Tx  on  S  (Figure  2). 

Within  the  model,  an  executed  test  assumes  one  of  two  values:  PASS  or  FAIL.  A  PASS 
result  for  a  given  test  Tx  indicates  that  no  region  covered  by  this  test  contains  a  defect.  A  FAIL 
result  indicates  that  at  least  one  of  the  modules  covered  by  Tx  contains  a  defect  or  is  BAD  in  the 
model definition. While a FAIL result should reduce the set of modules that may need to be
replaced,  a  perfect  result — replacing  only  failed  modules — will  typically  require  some  sequence 
of  tests.  Indeed,  for  a  particular  configuration  of  tests  and  modules,  this  perfect  result  may  not 
be  achievable.  Analysis  of  simulation  results  should  help  identify  those  cases  in  which  further 
testing  will  yield  no  new  information. 


Figure  2.  Notional  Depiction  of  the  Coverage  of  Tx  on  S, 
with  Multiple  Modules  Exercised  by  This  Test 
(A  FAIL  result  from  Tx  indicates  that  at  least  one 
of  the  subset  {Mi,  Mj,  Mk}  has  failed.) 


The  use  of  vector  arcs  to  model  the  coverage  relationship  between  tests  and  modules 
enables  precision  when  specifying  the  coverage  by  multiple  tests  on  a  single  replaceable  unit 
(Figure  3).  Although  several  tests  in  the  system  suite  may  exercise  a  given  module,  it  is  likely  in 
the  physical  system  that  these  tests  overlap  significantly.  This  language  of  description,  then, 
permits  a  user  specification  of  the  physical  system  in  broad  terms  (e.g.,  the  Remote  Control  test 
and  Obstacle  Detection  test  both  exercise  about  70%  of  the  Garage  Door  Motor  module,  with 
about  20%  overlap  between  the  two  tests).  Even  if  these  data  are  estimated  from  the  physical 
system,  existing  case  data  and  simulation  results  could  be  used  to  provide  better  specification 
of  these  joint  coverages. 
Figure 3. Overlapping Coverage between Tests Tx and Ty
Is Characterized with the Arcs Aix and Aiy
(The joint coverage is computable as the intersection of these arcs.)
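
One way to compute such a joint coverage is sketched below in Python. Representing each arc as a (start, length) pair on [0, 1), and the helper names `split_arc` and `joint_coverage`, are assumptions made for illustration; the sketch also assumes each arc length is at most 1.

```python
def split_arc(start, length):
    """Split an arc on the unit circle into non-wrapping (lo, hi) intervals."""
    start %= 1.0
    end = start + length
    if end <= 1.0:
        return [(start, end)]
    return [(start, 1.0), (0.0, end - 1.0)]   # arc wraps through position 0

def joint_coverage(arc_x, arc_y):
    """Return the total length shared by two arcs, each given as (start, length)."""
    total = 0.0
    for lo_x, hi_x in split_arc(*arc_x):
        for lo_y, hi_y in split_arc(*arc_y):
            total += max(0.0, min(hi_x, hi_y) - max(lo_x, lo_y))
    return total

# Hypothetical coverages on one module: Tx covers [0.0, 0.5), Ty covers [0.3, 0.7).
overlap = joint_coverage((0.0, 0.5), (0.3, 0.4))
combined = 0.5 + 0.4 - overlap   # fraction of the module exercised by either test
```

Splitting wrapped arcs into plain intervals keeps the intersection arithmetic simple, at the cost of handling up to four interval pairs per comparison.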


2.3  Summary 

This  conceptual  model  captures  the  essential  elements  of  a  system  with  respect  to 
testing,  suitable  for  both  diagnostic  and  regression  work.  The  physical  system  is  specified  in 
terms  of  modules,  tests,  and  coverages,  with  model  elements  constructed  in  such  a  way  that 
imperfect  information  can  still  be  used  as  an  initial  state. 

Although  the  model  requires  that  the  physical  system  be  decomposable  into  discrete 
units  of  replacement,  this  does  not  limit  the  usefulness  of  this  approach.  Within  the  system,  we 
expect  different  levels  of  maintenance  (e.g.,  depot  maintenance,  field-level  maintenance)  and 
different  levels  of  diagnostic  techniques,  all  of  which  could  be  treated  as  different  layers  within 
this  framework.  We  next  formalize  these  model  elements  in  mathematical  language  to  construct 
a  suitable  computer  simulation  to  investigate  these  testing  strategies. 

3.  Mathematical  Analysis 

Our  goal  in  this  study  of  system  testing  is  to  maximize  certainty  for  a  given  cost.  In 
developing  a  probability  framework  to  model  this  process,  we  first  form  simple  objective 
measures  to  characterize  knowledge  of  the  system  state.  We  next  examine  a  simple,  step-wise 
strategy  to  predict  a  test  sequence  that  will  maximize  or  minimize  these  measures.  We  then 
compare  this  strategy  to  a  two-test  approach. 

The motivation for examining a two-test (or k-test) strategy under fixed cost is that this
problem  very  much  resembles  the  classic  knapsack  problem  (Corman,  Leiserson  &  Rivest, 
2002).  Choosing  at  each  step  the  single  test  that  offers  the  largest  increase  in  information  (that 
is,  inserting  the  largest  item  into  the  knapsack  first)  does  not  guarantee  that  we  will,  for  a  fixed 
cost,  achieve  the  greatest  information  gain  (maximize  the  content  of  our  knapsack).  It  would  be 
computationally  advantageous,  though,  if  we  could  demonstrate  that  for  many  cases  of  interest, 
a  simple  best-next  strategy  can  approach  a  k-step  strategy  in  information  return  (Cover  & 
Thomas,  1991). 
3.1 Module definitions


Within our system S, we define Bi and Gi as the events that module Mi is bad or is good,
with corresponding probabilities:

$$P(B_i) = b_i, \qquad P(G_i) = 1 - b_i \tag{3.1}$$

Each bi represents information we have about the state of module Mi, and the collection
{bi} gives us some insight into the health of S. Prior to any testing, we expect each bi is
initialized based on an a priori failure rate for Mi.

The probability bi is an intuitive measure of information, and we see that as bi tends to 0
(good) or 1 (bad), our knowledge about Mi becomes more certain. A classic, quantitative
measure of this knowledge is the information entropy (Shannon, 1948):

$$h_i = -b_i \log_2 b_i - (1 - b_i) \log_2 (1 - b_i) \tag{3.2}$$

We see that as bi tends to 0 or 1, hi is minimized (Figure 4). By applying tests from our
diagnostic suite, we should become more certain about the state of a module (good or bad) and
so act to nudge bi to the edges of the interval [0, 1]. We can measure this improvement in
certainty as a reduction in the individual module entropy hi, aggregated over the system:

$$H = \sum_{i=1}^{n} h_i \tag{3.3}$$


Entropy is computationally attractive as a continuous and differentiable function over the
interval of interest (Figure 4), though hi may be less intuitive when deciding which modules to
replace. A measure similar to entropy, though not differentiable at maximum entropy, is:

$$q_i = \max(b_i,\; 1 - b_i) \tag{3.4}$$

We can think of qi as a quality gauge of this replacement (or maintenance) decision with
respect to a particular module. If, for example, a particular module has bi = 0.70, we may
replace it knowing that this informed guess should be correct 70% of the time. This also means
that in 30% of these cases, we will unnecessarily replace or perform more granular debugging
on this module. Our number of correct diagnoses across the system will increase as each bi is
adjusted, by testing, away from bi = 0.5 towards either 0 or 1 (Figure 4). Although this is not a
rigorous result, it can be shown that minimizing system entropy is approximately equivalent to
maximizing the number of correct diagnoses.
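
The two measures can be compared numerically with a short Python sketch. The function names and the sample probabilities below are illustrative assumptions, and the system entropy is taken here to be the sum of the module entropies.

```python
import math

def module_entropy(b):
    """Binary entropy (Equation 3.2), in bits, of a module with P(bad) = b."""
    if b in (0.0, 1.0):
        return 0.0   # limiting value: 0 * log2(0) is taken as 0
    return -b * math.log2(b) - (1.0 - b) * math.log2(1.0 - b)

def decision_quality(b):
    """Quality gauge max(b, 1 - b) from Equation 3.4."""
    return max(b, 1.0 - b)

def system_entropy(probabilities):
    """Sum of the module entropies over the whole system."""
    return sum(module_entropy(b) for b in probabilities)

# Hypothetical module probabilities before and after some testing.
before = [0.5, 0.5, 0.5]
after = [0.9, 0.1, 0.7]
entropy_drop = system_entropy(before) - system_entropy(after)
quality_gain = sum(map(decision_quality, after)) - sum(map(decision_quality, before))
```

In this example both measures move in the expected direction: entropy falls and the summed quality gauge rises as the probabilities leave 0.5.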
Figure 4. Module Entropy h(bi) and qi = max(bi, 1 - bi),
with Notional Module Probability bi Indicated
(Horizontal axis: module probability bi. The scalar δ represents the displacement of bi from
maximum entropy (bi = 0.5). Note that by symmetry, h(bi) = h(1 - bi), with distance 2δ
between these states.)


Consistent with previous studies (e.g., Birnbaum, Esary & Saunders, 1961; Butterworth,
1972; Ben-Dov, 1981), we characterize our knowledge of S as a vector of these module
probabilities, {bi}, where each component probability bi is on the interval [0, 1]. The true state of
S is a bit vector with each component exactly 0 (good) or 1 (bad); in practice, we are unlikely to
achieve this "perfect" knowledge, but computation of bi or qi permits some insight into this true
state.


Earlier studies typically examined scenarios in which S will function only if all modules
are working correctly (serial system), some modules are working correctly (k-of-n system), or if
at least one module is working correctly (parallel system). In the present study, we make no
assumptions about whether S is in a known down state and instead focus on characterizing the
health of the system, similar to literature in optimal maintenance strategies (e.g., Bovaird,
1961; or Barlow & Proschan, 1965). The focus in the present work is on the nature of the test
suite available to the diagnostician or maintainer, and the most effective use of that suite to
better characterize S. We next present the mathematical model for tests and testing.

3.2  Test  definitions 

Similar to our model of modules, we define Px and Fx as the events that test Tx passes or
fails,  respectively.  We  expect  either  result  (pass  or  fail)  to  return  ambiguous  information 
because  a  test  likely  exercises  or  covers  only  some  fraction  of  the  functionality  of  a  module 
(Figure  1),  and  because  the  test  likely  exercises  several  modules  simultaneously  (Figure  2). 
Thus,  a  passing  result  for  Tx  indicates  only  that  no  defect  was  detected,  while  a  failing  result 
narrows  the  pool  of  suspect  modules  to  those  exercised  by  Tx. 

After execution of a test, we update the prior probability bi to the new probability bi'
based on the test outcome:

$$b_i' = \begin{cases} P(B_i \mid P_x) & \text{if } T_x \text{ passes} \\ P(B_i \mid F_x) & \text{if } T_x \text{ fails} \end{cases} \tag{3.5}$$

Using Bayes' rule, we can compute these probabilities as:

$$P(B_i \mid P_x) = \frac{P(P_x \mid B_i)\, P(B_i)}{P(P_x)} = b_i \left[ \frac{P(P_x \mid B_i)}{P(P_x)} \right] \tag{3.6}$$

$$P(B_i \mid F_x) = \frac{P(F_x \mid B_i)\, P(B_i)}{P(F_x)} = b_i \left[ \frac{P(F_x \mid B_i)}{P(F_x)} \right] \tag{3.7}$$


These results suggest that tests can be seen as operators that transform bi into bi'. We
note that the module probability is unchanged if test Tx has no coverage on Mi. In this case, the
conditional probabilities P(Px|Bi) and P(Fx|Bi) degenerate to the unconditional probabilities P(Px)
and P(Fx), and so we have bi' = bi.
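
A minimal Python sketch of this update operator follows. The function name, its argument layout, and the sample numbers are assumptions made for illustration; the conditional and unconditional pass probabilities are supplied by the caller rather than derived from coverages.

```python
def update_bad_probability(b, p_pass_given_bad, p_pass, passed):
    """Apply Equation 3.6 (pass) or Equation 3.7 (fail) to the prior b = P(bad).

    p_pass_given_bad is P(Px | Bi); p_pass is the unconditional P(Px).
    """
    if passed:
        likelihood, evidence = p_pass_given_bad, p_pass
    else:
        likelihood, evidence = 1.0 - p_pass_given_bad, 1.0 - p_pass
    if evidence == 0.0:
        raise ValueError("observed outcome has zero probability under the model")
    return b * likelihood / evidence

# Hypothetical numbers: the test passes 80% of the time overall,
# but only 40% of the time when this module is bad.
posterior_after_pass = update_bad_probability(0.2, 0.4, 0.8, passed=True)
posterior_after_fail = update_bad_probability(0.2, 0.4, 0.8, passed=False)

# With no coverage, the conditional equals the unconditional and b is unchanged.
unchanged = update_bad_probability(0.3, 0.7, 0.7, passed=True)
```

Weighting the two posteriors by the probabilities of a pass and a fail recovers the prior, which is a convenient consistency check on any implementation.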


Because  the  execution  of  test  Tx  necessarily  incurs  some  cost  in  budget  or  schedule,  we 
are  motivated  to  compute  a  forecast  value,  qix,  from  a  weighted  sum  of  the  unconditional 
probabilities  on  Px  and  Fx  (Equation  3.8).  Given  the  prior  state  of  the  system  as  the  set  of 
module probabilities {bi}, this method can then be used to assess the expected change across
the  system  for  a  particular  test  Tx. 


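$$\begin{aligned} q_{ix} &= \max\{ P(B_i \mid P_x),\; P(G_i \mid P_x) \}\, P(P_x) + \max\{ P(B_i \mid F_x),\; P(G_i \mid F_x) \}\, P(F_x) \\ &= \max\{ P(P_x \mid B_i) P(B_i),\; P(P_x \mid G_i) P(G_i) \} + \max\{ P(F_x \mid B_i) P(B_i),\; P(F_x \mid G_i) P(G_i) \} \end{aligned} \tag{3.8}$$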


Using the expected value of qi after test Tx (Equation 3.8), we can form a composite
measure over our system of n modules with the sum Q(Tx):

$$Q(T_x) = \sum_{i=1}^{n} q_{ix} \tag{3.9}$$

We expect that if Q(Tx) > Q(Ty), then test Tx will return more information than test Ty
(Figure 4). This value, however, is a forecast; the actual information returned for a particular test
execution may vary widely by scenario.

We note also that because Q depends upon our current knowledge of the system as the
set of probabilities {bi}, and because this set is constantly updated by testing, the choice of a
particular test Tx may yield widely varying Q(Tx) depending upon when Tx is executed in the test
sequence.
To calculate qix we need the conditional probability P(Px|Bi), which we compute by first
considering the unconditional probability P(Px). We note that a test Tx will pass if every module
covered is either good (Equation 3.1) or bad but undetected. A test Tx will fail to find a defect
with probability (1 - aix), or the complement to the fractional coverage of Tx on Mi (Figure 1).
Considering all n modules in the system, then, we have:


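$$P(P_x) = \prod_{i=1}^{n} \Big[ \underbrace{(1 - b_i)}_{\text{GOOD}} + \underbrace{(1 - a_{ix})\, b_i}_{\text{BAD, NOT DETECTED}} \Big] = \prod_{i=1}^{n} \left[ 1 - a_{ix} b_i \right] \tag{3.10}$$

Given that module Mi is bad (bi = 1), we then have:

$$P(P_x \mid B_i) = (1 - a_{ix}) \prod_{j \neq i} \left[ 1 - a_{jx} b_j \right] \tag{3.11}$$

Similarly, we note that P(Fx) = 1 - P(Px), and we can then see that:

$$P(F_x \mid B_i) = 1 - (1 - a_{ix}) \prod_{j \neq i} \left[ 1 - a_{jx} b_j \right] \tag{3.12}$$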


We see that if Tx has no coverage on Mi (aix = 0), Equations 3.11 and 3.12 reduce to the
unconditional probabilities on Tx. Similarly, if Tx has perfect coverage (aix = 1), Equations 3.11
and 3.12 reduce to 0 (Tx cannot pass if Mi is bad) and 1 (Tx must fail if Mi is bad).


Using Equation 3.10 and its complement, the conditional probabilities given that Mi is
good are:

$$P(P_x \mid G_i) = \prod_{j \neq i} \left[ 1 - a_{jx} b_j \right] \tag{3.13}$$

$$P(F_x \mid G_i) = 1 - \prod_{j \neq i} \left[ 1 - a_{jx} b_j \right] \tag{3.14}$$

A quick check of the boundaries shows that if Mi is good and Tx has no coverage on any
other module (all ajx = 0 for j ≠ i), Equation 3.13 reduces to 1 (Tx must pass), and Equation 3.14
reduces to 0 (Tx cannot fail). Indeed, this set of equations (3.11-3.14) addresses
computationally the ambiguity associated with test results, coverages and modules (Figure 1
and Figure 2).

3.3 Test strategies

The objective function Q(Tx) (Equation 3.9) is necessarily a one-step method if we
choose that Tx which maximizes Q. If we have only the budget or schedule to execute one more
test, maximizing Equation 3.9 will yield the optimal result. In practice, though, we expect that we
may have the resources to execute some number of tests; and, similar to the classic knapsack
problem (Corman, Leiserson & Rivest, 2002), we are not guaranteed that this simple, one-step
strategy will generally yield the largest information gain for a given cost.
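
The one-step selection can be sketched in Python as follows. This is an illustration under stated assumptions rather than the simulation code used in this study: the function names, the list-of-lists layout for coverages, and the sample system are all hypothetical, and each test's coverage of a module is treated as a single scalar aix.

```python
from math import prod

def p_pass(a_x, b, skip=None):
    """Product of (1 - a_jx * b_j) over modules, omitting index `skip`.

    With skip=None this is P(Px) (Equation 3.10); with skip=i it is
    P(Px | Gi) (Equation 3.13).
    """
    return prod(1.0 - a * bj for j, (a, bj) in enumerate(zip(a_x, b)) if j != skip)

def forecast(a_x, b):
    """Q(Tx): the sum over modules of q_ix (Equations 3.8 and 3.9)."""
    total = 0.0
    for i, (a_i, b_i) in enumerate(zip(a_x, b)):
        pass_good = p_pass(a_x, b, skip=i)      # Equation 3.13
        pass_bad = (1.0 - a_i) * pass_good      # Equation 3.11
        fail_good = 1.0 - pass_good             # Equation 3.14
        fail_bad = 1.0 - pass_bad               # Equation 3.12
        total += max(pass_bad * b_i, pass_good * (1.0 - b_i))
        total += max(fail_bad * b_i, fail_good * (1.0 - b_i))
    return total

def best_next(coverages, b):
    """Index of the test with the largest forecast Q (one-step strategy)."""
    return max(range(len(coverages)), key=lambda x: forecast(coverages[x], b))

# Hypothetical system: three modules, two tests; coverages[x][i] is a_ix.
b = [0.5, 0.5, 0.1]
coverages = [[0.9, 0.0, 0.0],   # first test exercises most of the first module
             [0.0, 0.0, 0.9]]   # second test exercises most of the third module
choice = best_next(coverages, b)
```

In this example the first test is preferred because it interrogates a module whose state is highly uncertain, while the second test targets a module that is already believed to be good.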

As a simple example of this knapsack problem, consider three tests {T1, T2, T3} with a
forecast information return of {Q(T1) = 3, Q(T2) = 4, Q(T3) = 6} for associated cost {3, 4, 5}; the
units of information and cost are not important to our point, and the ratio can be thought of as
unit cost per bit. Given a fixed cost constraint of 7, the single best-next test choice is T3, with a
cost-per-bit of 5/6 and a net return of 6. The choice of T3, though, means that we cannot execute
another test within the cost constraint of 7. This strategy, then, is clearly not optimal for the fixed
constraint because the choice of T1 and T2, each individually more expensive per bit than T3,
yields a net information return of 7.
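
The comparison in this example can be reproduced with a short Python sketch. It treats each forecast Q as a fixed, additive value, which is a simplification of the model (in general Q changes as tests are executed); the function names and the dictionary layout are assumptions made for illustration.

```python
from itertools import combinations

def greedy_by_ratio(tests, budget):
    """Repeatedly take the affordable test with the best information per cost."""
    chosen, remaining = [], budget
    candidates = dict(tests)
    while True:
        affordable = {n: (q, c) for n, (q, c) in candidates.items() if c <= remaining}
        if not affordable:
            return chosen
        name = max(affordable, key=lambda n: affordable[n][0] / affordable[n][1])
        chosen.append(name)
        remaining -= candidates.pop(name)[1]

def exhaustive(tests, budget):
    """Check every subset of tests and keep the most informative affordable one."""
    best, best_info = (), 0
    for r in range(1, len(tests) + 1):
        for subset in combinations(tests, r):
            cost = sum(tests[n][1] for n in subset)
            info = sum(tests[n][0] for n in subset)
            if cost <= budget and info > best_info:
                best, best_info = subset, info
    return list(best)

# The three tests from the example: name -> (forecast information Q, cost).
tests = {"T1": (3, 3), "T2": (4, 4), "T3": (6, 5)}
greedy_choice = greedy_by_ratio(tests, budget=7)    # picks T3 only
optimal_choice = exhaustive(tests, budget=7)        # picks T1 and T2
```

The exhaustive search visits every subset of the suite, which is why it becomes impractical as the number of tests grows.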

To guarantee an optimal solution, then, we are obligated to compute 2^k possible
outcomes for a suite of k tests for all n modules in S. To mitigate this computational burden, we
examine the real differences between an optimal solution and a good solution for our scenarios
of interest.

For  additional  insight,  we  consider  a  two-step  strategy  and  its  associated  objective 
function  with  four  components  (Equation  3.15),  computed  over  all  modules  (Equation  3.16): 


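$$\begin{aligned} q_{ixy} ={}& \max\{ P(P_x P_y \mid B_i) P(B_i),\; P(P_x P_y \mid G_i) P(G_i) \} \\ &+ \max\{ P(P_x F_y \mid B_i) P(B_i),\; P(P_x F_y \mid G_i) P(G_i) \} \\ &+ \max\{ P(F_x P_y \mid B_i) P(B_i),\; P(F_x P_y \mid G_i) P(G_i) \} \\ &+ \max\{ P(F_x F_y \mid B_i) P(B_i),\; P(F_x F_y \mid G_i) P(G_i) \} \end{aligned} \tag{3.15}$$

$$Q(T_x, T_y) = \sum_{i=1}^{n} q_{ixy} \tag{3.16}$$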


We  note  that  in  these  pair-wise  calculations  (or  in  k-wise  calculations),  we  must  consider 
the  possible  intersection  of  coverages  between  Tx  and  Ty  (Figure  3).  That  is,  we  expect  two 
tests  with  significant  overlap  in  module  coverage  should  not  yield  a  higher  qixy  than  two  paired 
tests  with  similar  fractional  coverages  but  no  overlap  or  intersection  between  the  two  tests. 
Although the analytic work for these conditional probabilities follows Equations 3.11-3.14, we
next  turn  to  simulation  to  exercise  this  strategy  for  comparison  to  the  single-step,  best-next 
strategy. 

3.4  Summary 

We  have  presented  the  mathematical  details  supporting  the  abstract  model  presented  in 
Section 2. We characterize our knowledge of the system health as a collection of probabilities
{bi}, where bi indicates the probability that component Mi is bad (bi = 1) or good (bi = 0). Using a
sequential  Bayes’  approach,  prior  probabilities  in  {b,}  are  updated  following  test  execution.  As 
more  tests  are  applied,  each  Tx  should  act  to  minimize  entropy  H  (Equation  3.3)  or  increase  our 
certainty  about  the  state  of  each  module  as  either  good  or  bad. 

The  paper  focuses  on  strategies  to  choose  a  test,  or  sequence  of  tests,  to  guarantee  the 
best  (or  at  least  a  very  good)  return  on  the  budget  or  schedule  allocated  to  testing.  In  the 
present work, our constraint is a given, fixed cost and our goal is to find the greatest information
gain  possible  in  the  test  suite  within  this  cost.  Although  an  optimal  solution  may  often  be 
computationally  untenable,  we  next  examine  simulation  results  to  better  estimate  the  distance 
between  “optimal”  and  “good.” 

4.  Modeling  Approach  and  Simulation  Results 

In  support  of  this  research,  a  desktop  simulation  was  developed  to  implement  the 
analysis  presented  in  Section  3  and  to  further  examine  the  choices  among  test  strategies.  In 
addition  to  the  best-next  and  two-step  strategies,  a  random-strategy  case  was  coded  to  select  a 
test  sequence  randomly,  and  a  pathological  worst-case  strategy  was  created  to  minimize  rather 
than  maximize  the  information  return  per  test.  The  random  and  worst-case  configurations  were 
developed  to  provide  some  contrast  to  the  “best”  strategies. 

4.1  Model  description 

The simulation code implements object models of Tests and Modules, collected under a
System object. Configuration parameters are set in an XML text file (Figure 5). Each XML file
represents a simulation, which comprises one or more configurations, or simulation cases.
Each configuration is then executed for some number of trials.

Within each configuration, the numbers of modules and tests are set explicitly, while the
module a priori failure rates and test-module coverages are drawn randomly between
minimum and maximum parameters (Figure 5). These random coverages can be regenerated
between trials to permit a Monte Carlo investigation of the initial data. While these randomized
scenarios provide some insight into systems testing, the computer code and the XML
configuration parameters are flexible enough to encompass more realistic systems.
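
The randomized setup described above can be sketched as follows. This is an illustration under stated assumptions, not the simulation's own code: the function name, the dictionary representation of a test (module index mapped to coverage), and the use of uniform draws are all hypothetical, and the sketch ignores the cost and tests-per-module bounds that appear in Figure 5.

```python
import random

def random_system(n_modules, n_tests, failure_rate=(0.5, 0.5),
                  modules_per_test=(1, 3), coverage=(0.20, 1.00), seed=None):
    """Draw a priori failure rates and test-module coverages within bounds."""
    rng = random.Random(seed)
    # One a priori failure rate per module, uniform between the two bounds.
    rates = [rng.uniform(*failure_rate) for _ in range(n_modules)]
    tests = []
    for _ in range(n_tests):
        k = rng.randint(*modules_per_test)         # how many modules this test touches
        touched = rng.sample(range(n_modules), k)  # distinct module indices
        tests.append({i: rng.uniform(*coverage) for i in touched})
    return rates, tests

# Same sizes as the sample configuration: 10 modules and 35 tests.
rates, tests = random_system(n_modules=10, n_tests=35, seed=1)
```

Calling the function again with a different seed corresponds to regenerating the coverages between trials.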

Because  of  the  iterative  nature  of  the  model,  the  algorithm  should  be  relatively 
insensitive  to  the  initial  conditions  in  the  model  with  respect  to  module  a  priori  failure  rates.  That 
is, the state vector {b_i} is constantly adjusted through the application of tests, and we expect this
convergence  to  dominate  the  final  or  quasi-steady  state  of  knowledge  regarding  our  system. 


DEFENSE  ACQUISITION  IN  TRANSITION 



<?xml version="1.0"?>
<simulation>
  <!-- SAMPLE CONFIGURATION FILE -->
  <configuration>

    <!-- EXECUTION PARAMETERS -->
    <CaseName>best</CaseName>
    <Strategy>best</Strategy>
    <RandomSeed>-1</RandomSeed>
    <NumberOfTrials>10</NumberOfTrials>
    <DecisionThreshold>0.90</DecisionThreshold>
    <DefectsPerTrial>1</DefectsPerTrial>
    <LogFileName>simulation.log</LogFileName>
    <ReconfigureTestsPerTrial>yes</ReconfigureTestsPerTrial>

    <!-- ================= -->
    <!-- MODULE PARAMETERS -->
    <!-- ================= -->
    <NumberOfModules>10</NumberOfModules>
    <FailureRate>
      <Minimum>0.5</Minimum>
      <Maximum>0.5</Maximum>
    </FailureRate>
    <CostPerModule>
      <Minimum>1.0</Minimum>
      <Maximum>1.0</Maximum>
    </CostPerModule>
    <SumCostOfAllModules>100.0</SumCostOfAllModules>
    <TestsPerModule>
      <Minimum>1</Minimum>
      <Maximum>5</Maximum>
    </TestsPerModule>

    <!-- =============== -->
    <!-- TEST PARAMETERS -->
    <NumberOfTests>35</NumberOfTests>
    <CostPerTest>
      <Minimum>1.00</Minimum>
      <Maximum>1.00</Maximum>
    </CostPerTest>
    <SumCostOfAllTests>100.0</SumCostOfAllTests>
    <ModulesPerTest>
      <Minimum>1</Minimum>
      <Maximum>3</Maximum>
    </ModulesPerTest>
    <CoveragePerModule>
      <Minimum>0.20</Minimum>
      <Maximum>1.00</Maximum>
    </CoveragePerModule>

  </configuration>
  <!-- END OF CONFIGURATION -->
</simulation>



Figure  5.  Sample  Configuration  XML  File 


4.2  Model  processing 

Prior to the start of a configuration run (a set of trials), a failure deck is created based on
the relative failure rates of modules within the system. As in a deck of playing cards, each
module appears in the failure deck a number of times set by its failure rate relative to the
minimum failure rate in the system; thus, if the minimum failure rate across the system is 0.2,
a module with a failure rate of 0.6 will appear three times within the failure deck. The same
deck is employed across all trials in a configuration run to simulate the relative appearance of
failures in a physical system. No a priori assumption was made in the mathematical analysis
(Section 3) about the number of defects, and the simulation code reflects this versatility.
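
A minimal sketch of the failure deck follows. The function names are hypothetical, and two details are assumptions rather than facts from the text: the ratio of a module's rate to the minimum rate is rounded to the nearest whole number of cards, and defects are drawn until the requested number of distinct modules has been chosen.

```python
import random

def build_failure_deck(failure_rates):
    """Repeat each module index in proportion to its rate over the minimum rate."""
    lowest = min(failure_rates)  # assumed to be greater than zero
    deck = []
    for index, rate in enumerate(failure_rates):
        copies = max(1, round(rate / lowest))  # ASSUMPTION: round to whole cards
        deck.extend([index] * copies)
    return deck

def plant_defects(deck, count, rng):
    """Draw cards until `count` distinct modules hold a planted defect."""
    count = min(count, len(set(deck)))  # cannot plant more defects than modules
    chosen = set()
    while len(chosen) < count:
        chosen.add(rng.choice(deck))
    return chosen

# Minimum rate 0.2: the module with rate 0.6 gets three cards, rate 0.4 gets two.
deck = build_failure_deck([0.2, 0.6, 0.4])
defective = plant_defects(deck, 1, random.Random(0))
```

Because the deck is built once per configuration run, modules with higher failure rates are drawn as the defective module proportionally more often across trials.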

Prior to the start of a trial, a test deck with one entry for each test is created (copied)
from the system configuration. Strategies (best-next, best-next-two, random, and worst)
consume this list as the “next” choice is executed; thus, as a test is executed it is removed from
the deck, ensuring that no test will be executed more than once per trial. This also reduces the
search space for the next test. A new test deck must be generated with each trial.

A  single  trial  is  processed  in  the  following  manner: 

1. All module b_i are initialized from failure rate data.

2. All module coverages are established; these are either duplicated from the
previous trial or randomized subject to the same configuration parameters.

3. Some number p of modules (where 0 ≤ p ≤ n) are selected from the failure deck
and a defect is planted in each module. It is possible (and interesting) to run the
simulation with no defects planted.

4. A test is chosen based on a simple strategy (e.g., best, random, or worst).

5. This test is applied to the system object.

6. All affected b_i are updated based on the outcome of (5).

7. If there is still a test in the test deck, return to (4).
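
Steps 4 through 7 can be sketched as below. This is an illustration under assumptions that are not taken from Section 3: a test detects a defect in an exercised bad module with probability equal to its coverage, a good module never causes a failure, module states are treated as independent, and the "best" strategy picks the test with the lowest forecast entropy. All names are hypothetical.

```python
import math
import random

def entropy(probs):
    """Sum of binary entropies (bits) over all module probabilities."""
    return sum(-b * math.log2(b) - (1 - b) * math.log2(1 - b)
               for b in probs if 0.0 < b < 1.0)

def p_pass(b, test):
    """P(test passes): no exercised module has a defect that the test detects."""
    return math.prod(1.0 - a * b[i] for i, a in test.items())

def update(b, test, passed):
    """Bayes update of every b_i the test touches; returns a new list."""
    new_b, pp = list(b), p_pass(b, test)
    for i, a in test.items():
        if passed:
            value = b[i] * (1.0 - a) / (1.0 - a * b[i])
        else:
            # Probability that the other exercised modules trigger no failure.
            others = math.prod(1.0 - aj * b[j] for j, aj in test.items() if j != i)
            value = b[i] * (1.0 - (1.0 - a) * others) / (1.0 - pp)
        new_b[i] = min(1.0, max(0.0, value))  # guard against rounding drift
    return new_b

def expected_entropy(b, test):
    """Forecast H after the test, averaged over its two outcomes."""
    pp = p_pass(b, test)
    outcomes = ((True, pp), (False, 1.0 - pp))
    return sum(w * entropy(update(b, test, passed))
               for passed, w in outcomes if w > 0.0)

def run_trial(priors, tests, defects, strategy="best", rng=None):
    """Apply every test once; return final b_i and the entropy after each test."""
    rng = rng or random.Random()
    b, deck, history = list(priors), list(tests), []
    while deck:                                  # step 7: any tests left?
        if strategy == "random":
            test = rng.choice(deck)              # step 4, random strategy
        else:
            pick = max if strategy == "worst" else min
            test = pick(deck, key=lambda t: expected_entropy(b, t))
        deck.remove(test)                        # each test runs at most once
        # Step 5: the test fails if it detects any planted defect it exercises.
        failed = any(i in defects and rng.random() < a for i, a in test.items())
        b = update(b, test, passed=not failed)   # step 6
        history.append(entropy(b))
    return b, history

priors = [0.5, 0.5, 0.5]
tests = [{0: 0.9}, {1: 0.8, 2: 0.6}, {0: 0.5, 1: 0.5}, {2: 1.0}]
b, history = run_trial(priors, tests, defects={1}, rng=random.Random(7))
```

Passing `strategy="random"` or `strategy="worst"` gives the contrast cases; running with `defects=set()` corresponds to the zero-defect runs.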


Although we are interested in improving test strategies under a fixed-cost constraint, in
these  trials  we  simply  execute  until  the  set  of  tests  is  exhausted.  The  motivation  for  this 
approach  is  that  by  allowing  the  simulation  to  process  all  tests  available,  we  also  gain  insight 
into  the  effectiveness  of  the  given  test  suite;  this  line  of  investigation  will  be  pursued  in  future 
studies. 

4.3  Simulation  results 

On a 2 GHz Intel processor, a simulation of 300 trials using the best-next, best-next-two,
random and worst strategies required about 17 minutes (for all four) with a randomized
configuration of 40 modules and 100 tests and one defect planted. The zero-defect simulations
required about the same execution time. In all of these cases, the initial probability of each
module was set to b_i = 0.5 (maximum entropy).

Figure 6. Comparison of Mean q_i among Simulations of a 40-module,
100-test System with One Defect Planted
(using the one-step (best-next), two-step (best-next-two),
random and worst cases)


Comparisons between the one-step and two-step approaches (Figure 6) show little
difference over the first 25 or so tests in terms of the mean maximum probability q_i, suggesting
that the one-step approach would yield an acceptable return on a fixed budget or schedule for
testing, at least for the randomized system configurations tested. More realistic scenarios may
show more significant divergence between one-step and two-step (and by extension, k-step)
methods.

Similar  comparisons  in  terms  of  information  entropy  (Figure  7)  show  more  displacement 
between  the  one-step  and  two-step  methods,  though  both  methods  are  clearly  superior  to  the 
random  and  “worst”  case  methods.  In  a  similar  simulation  configuration  but  with  no  defect 
(Figure  8),  the  information  entropy  shows  similar  descent.  At  the  tail  end  of  the  testing  process, 
though,  the  steady-state  H  is  lower  in  the  no-defect  case  (Figure  8)  than  in  the  one-defect 
(Figure  7)  case. 

Although  the  best-next-two  strategy  appears  somewhat  better  than  the  best-next 
strategy  for  the  fixed-cost  constraint,  the  CPU  time  required  for  this  two-step  simulation 
expanded  roughly  by  a  factor  of  four  on  a  per-trial  basis,  which  is  consistent  with  Equation  3.8 
and  Equation  3.15. 




[Figure 7 plot: mean entropy (vertical axis, ticks from 0.1 to 1.0) versus test applied (horizontal axis, 0 to 100)]

Figure 7. Comparison of Mean Entropy H among Simulations
of a 40-module, 100-test System with One Defect Planted
(using the one-step (best-next), two-step (best-next-two),
random and worst cases)

Figure 8. Comparison of Mean Entropy H among the NO DEFECT
Simulations of a 40-module, 100-test System
(showing only slightly lower entropy values for all cases,
though slope of descent is similar in all cases)

5. Conclusions and Future Work


In this study, we have developed a simple, effective framework to examine the testing of
complex systems. The idealized numerical experiments demonstrate that a simple best-next
approach is computationally efficient, if not optimal, and that its convergence to a correct
diagnosis appears to be very close to that of a two-step approach. Further investigation with more
exhaustive k-step testing strategies should confirm that for many cases of interest the simple
best-next approach yields acceptable results for a fixed-cost constraint.

A  novel  aspect  of  this  approach  is  the  focus  on  tests  as  distinct  objects  providing 
information  about  modules.  Much  of  the  previous  work  in  this  area  has  focused  on  knowledge  of 
the  initial  distribution  of  failure  rates,  requiring  almost  perfect  knowledge  of  these  a  priori  data  in 
order  to  be  effective  (Butterworth,  1972;  Ben-Dov,  1981).  In  terms  of  software  testing,  previous 
studies  have  relied  on  some  knowledge  of  the  internal  structure  of  components  or  reusable 
objects, or near-perfect knowledge of the module interconnections (Rothermel, Untch & Harrold,
2001; Mao & Lu, 2005). Here, we assume only that our system has some test suite; these tests
may  be  derived  from  requirements  documents,  from  formal  system  acceptance  plans,  or  from 
daily  systems  operations.  Application  of  this  model  requires  only  that  we  be  able  to  characterize 
these  tests  in  terms  of  approximate  coverages  on  units  of  replacement. 

Although  a  variable  cost  per  test  is  accounted  for  in  the  simulation  code,  the  runs 
presented  in  this  paper  assumed  a  constant  cost  per  test.  In  effect,  we  assume  a  unit  cost  per 
test,  such  that  the  number  of  tests  becomes  the  associated  cost.  Further  simulation  work  with 
more  realistic  configurations  of  modules,  tests  and  coverages  should  yield  more  insight  into 
operational  diagnostic  and  regression  problems. 

The  treatment  of  tests  as  distinct  objects  readily  enables  use  of  this  approach  for 
simulation  even  when  no  bug  or  defect  is  known  to  be  present.  This  testing  scenario,  which 
commonly  follows  a  major  system  upgrade  or  engineering  change  to  a  system,  is  often  referred 
to  as  regression  testing.  This  zero-defect  case  may  also  be  useful  to  evaluate  the  quality  of  a 
test  suite  with  respect  to  all  states  of  a  system.  In  the  zero-defect  cases  run  in  this  study,  the 
mean entropy (Figure 8) and maximum probability (not shown) do not go to 0 and 1,
respectively,  as  we  would  expect  if  all  states  were  reachable  from  the  test  suite.  This  line  of 
research  would  be  particularly  well-suited  for  the  classic  regression  or  test-retest  problems. 

Overview


• This project seeks to provide a prototype decision aid
to help control the cost of testing in an open
architecture (OA) environment

• Implementation of OA can lead to more rapid fielding
of increments in systems development

- However, frequent fielding requires frequent testing

• This is one of two efforts funded by PEO-IWS 7 that
seek to provide a rigorous basis for controlling
the spiraling cost of testing

Model approach


• Classic approaches to optimal testing focus on the
modules or components to be tested

• This is similar to building optimal search
strategies for submarines using only aircraft as
the reconnaissance platform, focusing on the
differences among submarines

• While it’s important to understand the
components being tested, or the targets of
our search, this can only take us so far,
and sometimes our targets are black
boxes

Model approach


• In the present work, we treat both tests and components
explicitly, using prior knowledge of both our system and its
diagnostic test suite to build an optimal test strategy

• This is similar to looking at all available platforms for the best
mix of sensors (tests) to match the most probable or most
lethal targets (faulty components)

Model fundamentals


A module is modeled as a
unit circle with probability
of being defective b_i

Test T_x exercises region A_ix
in module M_i

In general we assume that T_x
may exercise several regions
across several modules


A test has two possible outcomes:

- PASS indicates that the test did not detect a defect in any of the
exercised regions within the modules tested

- FAIL indicates that at least one module exercised is defective,
though we may not know which one

Model fundamentals


These ambiguities offer a rich framework for modeling realistic
system testing scenarios

- We do not need to execute (and pay for) T_x to forecast the information
returned by this test

- Within this language of expression we can formulate a quantitative
assessment of the information returned by a test sequence

Across the system of modules M_i we can measure the
information returned by a test using the classic residual entropy
for a distribution of probabilities:

$$H = \sum_i h_i = \sum_i \left[ -b_i \log_2 b_i - (1 - b_i) \log_2 (1 - b_i) \right]$$

Model fundamentals


At maximum entropy we have a 50/50 chance that our
module is good or bad; we might as well flip a coin

Model fundamentals


• From entropy, we derive the forecast measure:

$$Q(T_x) = \sum_i \left[ \max\left(b_i^{\text{fail}},\, 1 - b_i^{\text{fail}}\right) P(T_x \text{ fails}) + \max\left(b_i^{\text{pass}},\, 1 - b_i^{\text{pass}}\right) P(T_x \text{ passes}) \right]$$

• Let c_x be the cost of executing test T_x in appropriate units of
time or money (or both). A good strategy will sequence the
suite of tests such that:

$$\frac{Q(T_{[1]})}{c_{[1]}} \ge \frac{Q(T_{[2]})}{c_{[2]}} \ge \cdots \ge \frac{Q(T_{[m]})}{c_{[m]}}$$

• These ratios represent information per unit cost
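
A minimal sketch of sequencing by information per unit cost, assuming each test already carries a forecast value q and a cost. The field names, the sample numbers, and the budget cut-off are illustrative; in a sequential setting the forecasts would be recomputed after each executed test rather than held fixed as they are here.

```python
def sequence_by_value(tests):
    """Order tests by forecast information per unit cost, highest ratio first."""
    return sorted(tests, key=lambda t: t["q"] / t["cost"], reverse=True)

def select_within_budget(tests, budget):
    """Greedy pick: walk the ordered list, skipping tests that no longer fit."""
    chosen, spent = [], 0.0
    for t in sequence_by_value(tests):
        if spent + t["cost"] <= budget:
            chosen.append(t["name"])
            spent += t["cost"]
    return chosen, spent

suite = [
    {"name": "T1", "q": 0.9, "cost": 3.0},  # ratio 0.30
    {"name": "T2", "q": 0.5, "cost": 1.0},  # ratio 0.50
    {"name": "T3", "q": 0.8, "cost": 2.0},  # ratio 0.40
]
order = [t["name"] for t in sequence_by_value(suite)]  # T2, T3, T1
chosen, spent = select_within_budget(suite, budget=3.0)
```

With a budget of 3.0 the greedy pass selects T2 and T3 and skips T1, even though T1 has the largest forecast on its own.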


WWW.NPS.EDU 



NAVAL 
POSTGRADUATE 
SCHOOL 


Model  implementation 


•  A  prototype  decision  aid  was  crafted  from  this  mathematical 
model  for  desktop  simulation 

-  Development  in  platform-independent,  compact  Java 

-  Configuration  files  and  simulation  output  maintained  as  well-formed 
XML  files  for  experimentation  and  analysis 

• Within the simulated system, zero or more defects can be
planted within the set of modules

- With planted defects, we can examine the best test sequences to isolate
faults in a system down for repair

- With zero-defect runs, we can examine the information return on a test
suite for use in regression or post-maintenance analysis to verify that the
system is mission capable

Model implementation


• Within the decision aid, for simple
investigations, a fully randomized
system can be created with only a few
user-specified constraints

• If the user has a few system details but
only vague insight about others, these
aspects can be augmented with
randomized parameters (e.g., sizes and
number of coverages)

• A system with well-documented
interdependencies can be completely
specified by the user in terms of
modules, tests and coverages

[Graphic label: Precision of system specification]

Preliminary results


Comparison of 300 trials on a generic
system simulated using best-next and
best-next-two test strategies.

A random test selection strategy and an
information-minimizing worst-case
strategy are shown for contrast.

Note that even after all tests are executed
we are not 100% certain of our system
performance

[Plot axis labels: Number of tests applied; Probability module is bad]

Defects planted in both Module
11 and Module 19

A no-defect run could be used to assess
the power of a regression test suite for OA
upgrades

But, what does it all mean?


• Effective, cost-efficient testing is critical to the long-term
success of Open Architecture

• This model and prototype decision aid provide a rigorous yet
tractable way ahead to improve system testing

- And, to better understand and document the system and component
interdependencies across the enterprise

• Using this framework we can build the tools to:

- Lower the testing costs for a given level of system reliability

- Improve the use of existing suites for a given budget or schedule

- Design better, more targeted test suites to minimize redundancy

- Provide insight into the power or sensitivity of current test suites

Future work


• To further refine the current prototype into an operational
capability will require time and effort, notionally:

- Three months to work with subject matter experts in simulating
real-world cases from the OA community

- Six months to improve the user interface and tune the system
specification software to meet operational requirements

- Three months for user training and documentation updates

• This schedule only works if we have the OA test cases